learning theory - bias-variance tradeoff

some terminology in learning theory.

When studying learning theory, we care about how good a hypothesis is, not about the “specific parameterization of hypotheses or whether it is linear classification”,
so we define the hypothesis class H.

  • training error/empirical risk/empirical error of hypothesis h
    • the training set has size N
    • assumption 1 (one of the PAC assumptions):
      • training examples $(x^{(i)},y^{(i)})$ are drawn iid from some probability distribution D
        then we can define
        $\hat{\varepsilon}(h) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{h(x^{(i)}) \neq y^{(i)}\}$
        (see the first sketch at the end of these notes)
  • generalization error

    • DEF: under assumption 1, it is the probability that, if we now draw a new example (x, y) from the distribution D, h will misclassify it: $\varepsilon(h) = P_{(x,y)\sim D}(h(x) \neq y)$
    • it has two components: bias and variance
  • training error

    • the process of minimizing training error is called empirical risk minimization (ERM); see the first sketch at the end of these notes
      • think of ERM as the most “basic” learning algorithm
      • logistic regression can be viewed as an approximation of ERM
  • expected train error

    • defined by taking the expectation over all possible training datasets of size N
    • this means training on infinitely many datasets and averaging the training errors, which we cannot do in practice; instead we estimate it by, say, drawing m training datasets of size N and averaging the training error of each (see the second sketch at the end of these notes)
  • in-sample test error

    • the error on one given test pair (x, y)
  • test error

    • taking the expectation over the test data (i.e., averaging the in-sample test errors)
  • expected test error

    • the average over all possible training datasets of size N; again we cannot compute this exactly, so we estimate it with our limited test set
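
To make the training error $\hat{\varepsilon}(h)$ and ERM concrete, here is a minimal Python sketch. The hypothesis class of 1-D threshold classifiers, the data distribution D, and all constants and function names (`predict`, `empirical_risk`, `erm`) are my own illustrative assumptions, not anything fixed by these notes.

```python
import numpy as np

# Hypothesis class H of 1-D threshold classifiers h_t(x) = 1{x >= t}
# (an illustrative choice; the definitions above do not depend on it).

def predict(t, x):
    """h_t(x): predict 1 if x >= t, else 0."""
    return (x >= t).astype(int)

def empirical_risk(t, x, y):
    """Training error of h_t: the fraction of the N examples it misclassifies."""
    return float(np.mean(predict(t, x) != y))

def erm(x, y, thresholds):
    """Empirical risk minimization: return the h in H with the lowest training error."""
    risks = [empirical_risk(t, x, y) for t in thresholds]
    best = int(np.argmin(risks))
    return thresholds[best], risks[best]

# Draw N iid training examples (x^(i), y^(i)) from D (PAC assumption 1).
rng = np.random.default_rng(0)
N = 100
x_train = rng.uniform(0.0, 1.0, size=N)
y_train = (x_train >= 0.5).astype(int)            # "true" labels
y_train ^= (rng.random(N) < 0.1).astype(int)      # flip ~10% of labels as noise

H = np.linspace(0.0, 1.0, 101)                    # a small, finite hypothesis class
h_hat, train_err = erm(x_train, y_train, H)
print(f"ERM picked threshold {h_hat:.2f} with training error {train_err:.3f}")
```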
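
Building on the sketch above (it reuses `np`, `H`, `erm`, and `empirical_risk`), the second sketch estimates the expected train error and expected test error by drawing m training sets of size N and averaging, as described in the notes; `draw_dataset`, m, N, and the large test-sample size are again illustrative assumptions.

```python
def draw_dataset(rng, n):
    """Draw n iid examples (x, y) from the same distribution D as above."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = (x >= 0.5).astype(int)
    y ^= (rng.random(n) < 0.1).astype(int)
    return x, y

m, N = 50, 100
rng = np.random.default_rng(1)
train_errs, test_errs = [], []
for _ in range(m):
    x_tr, y_tr = draw_dataset(rng, N)          # one training set of size N
    h, tr_err = erm(x_tr, y_tr, H)             # fit by ERM on this training set
    x_te, y_te = draw_dataset(rng, 10_000)     # a large fresh sample stands in for D
    te_err = empirical_risk(h, x_te, y_te)     # test error of this trained h
    train_errs.append(tr_err)
    test_errs.append(te_err)

print(f"estimated expected train error: {np.mean(train_errs):.3f}")
print(f"estimated expected test error:  {np.mean(test_errs):.3f}")
```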